ml dataset
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- Europe > Austria (0.04)
- (13 more...)
- Research Report (0.67)
- Questionnaire & Opinion Survey (0.49)
- Overview (0.46)
RecKG: Knowledge Graph for Recommender Systems
Kwon, Junhyuk, Ahn, Seokho, Seo, Young-Duk
Knowledge graphs have proven successful in integrating heterogeneous data across various domains. However, there remains a noticeable dearth of research on their seamless integration among heterogeneous recommender systems, despite knowledge graph-based recommender systems garnering extensive research attention. This study aims to fill this gap by proposing RecKG, a standardized knowledge graph for recommender systems. RecKG ensures the consistent representation of entities across different datasets, accommodating diverse attribute types for effective data integration. Through a meticulous examination of various recommender system datasets, we select attributes for RecKG, ensuring standardized formatting through consistent naming conventions. Owing to these characteristics, RecKG can seamlessly integrate heterogeneous data sources, enabling the discovery of additional semantic information within the integrated knowledge graph. We apply RecKG to standardize real-world datasets, subsequently developing an application for RecKG using a graph database. Finally, we validate the interoperability achieved by RecKG through a qualitative comparison with other studies.
- Europe > Spain > Castile and León > Ávila Province > Ávila (0.05)
- Asia > South Korea > Incheon > Incheon (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Media > Music (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
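The abstract describes representing heterogeneous recommender datasets under one standardized vocabulary so they can be merged. A minimal sketch of that idea (not the authors' code; the relation names `user.interacted_with` and `item.genre` are hypothetical, for illustration only):

```python
# Illustrative sketch: flatten two recommender datasets into
# (head, relation, tail) triples under a shared, standardized vocabulary,
# so entities with the same meaning integrate across sources.

def to_triples(interactions, item_attrs):
    """Flatten a dataset into knowledge-graph triples."""
    triples = []
    for user, item in interactions:
        triples.append((f"user:{user}", "user.interacted_with", f"item:{item}"))
    for item, attr, value in item_attrs:
        triples.append((f"item:{item}", f"item.{attr}", value))
    return triples

# Two heterogeneous sources mapped onto the same relation names:
movie_triples = to_triples([("u1", "m1")], [("m1", "genre", "sci-fi")])
music_triples = to_triples([("u1", "s9")], [("s9", "genre", "jazz")])

# Merging becomes a simple union; "u1" denotes the same entity in both graphs.
merged = set(movie_triples) | set(music_triples)
```

Because both sources share entity and relation naming, the merged graph immediately exposes cross-dataset paths (e.g., the same user connected to both a movie and a song).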
Anticipating Technical Expertise and Capability Evolution in Research Communities using Dynamic Graph Transformers
Horawalavithana, Sameera, Ayton, Ellyn, Usenko, Anastasiya, Cosbey, Robin, Volkova, Svitlana
The ability to anticipate technical expertise and capability evolution trends globally is essential for national and global security, especially in safety-critical domains like nuclear nonproliferation (NN) and rapidly emerging fields like artificial intelligence (AI). In this work, we extend traditional statistical relational learning approaches (e.g., link prediction in collaboration networks) and formulate a problem of anticipating technical expertise and capability evolution using dynamic heterogeneous graph representations. We develop novel capabilities to forecast collaboration patterns, authorship behavior, and technical capability evolution at different granularities (e.g., scientist and institution levels) in two distinct research fields. We implement a dynamic graph transformer (DGT) neural architecture, which advances state-of-the-art graph neural network models by (a) forecasting heterogeneous (rather than homogeneous) nodes and edges, and (b) relying on both discrete- and continuous-time inputs. We demonstrate that our DGT models predict collaboration, partnership, and expertise patterns with 0.26, 0.73, and 0.53 mean reciprocal rank values for the AI domain and 0.48, 0.93, and 0.22 for the NN domain. DGT model performance exceeds the best-performing static graph baseline models by 30-80% across the AI and NN domains. Our findings demonstrate that DGT models boost inductive task performance, when previously unseen nodes appear in the test data, for domains with emerging collaboration patterns (e.g., AI). Specifically, the models accurately predict which established scientists will collaborate with early career scientists, and vice versa, in the AI domain.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Germany (0.05)
- Europe > Italy (0.05)
- (22 more...)
- Energy > Power Industry > Utilities > Nuclear (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- Government > Military (0.68)
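The abstract reports mean reciprocal rank (MRR) values such as 0.26, 0.73, and 0.53. As a reminder of what that metric measures, a minimal sketch (not the authors' evaluation code): for each query the model ranks candidate nodes, and MRR averages the reciprocal rank of the true answer.

```python
# Minimal sketch of mean reciprocal rank (MRR): average of 1/rank of the
# true item across queries, where rank is its 1-based position in the
# model's ranked candidate list.

def mean_reciprocal_rank(ranked_lists, true_items):
    total = 0.0
    for ranking, truth in zip(ranked_lists, true_items):
        rank = ranking.index(truth) + 1  # 1-based position of the true item
        total += 1.0 / rank
    return total / len(true_items)

# Two queries: true item ranked 1st, then 4th -> (1 + 0.25) / 2 = 0.625
mrr = mean_reciprocal_rank([["a", "b", "c"], ["b", "c", "d", "a"]], ["a", "a"])
```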
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages
Oliveira, Frederico S., Casanova, Edresson, Júnior, Arnaldo Cândido, Soares, Anderson S., Filho, Arlindo R. Galvão
In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goias (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS) and adapted for training TTS models, consisting of audiobooks in seven languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish. Additionally, we provide the YourTTS model, a multi-lingual TTS model, trained on 3,176.13 hours from CML-TTS plus 245.07 hours of English speech from LibriTTS. Our purpose in creating this dataset is to open up new research possibilities in the TTS area for multi-lingual models. The dataset is publicly available under the CC-BY 4.0 license.
- South America > Brazil (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
- Information Technology (0.47)
- Media (0.35)
Cleanlab: Correct your data labels automatically and quickly – Towards AI
Originally published on Towards AI. I used an open-sourced library, cleanlab, to remove low-quality labels on an image dataset. The model trained on the dataset without low-quality data gained 4 percentage points of accuracy compared to the baseline model (trained on all data). Improving data quality sounds easy enough. But the workload of manually checking data quality can quickly become insurmountable as the dataset scales.
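The workflow described above, scoring labels with a trained model and dropping the low-quality ones, can be sketched in a self-contained way. This illustrates the idea behind cleanlab's label-quality scoring, not its actual API: score each example by the model's predicted probability of its *given* label (its "self-confidence") and flag the lowest scores as likely label errors; the threshold here is an arbitrary choice for the example.

```python
import numpy as np

def label_quality_scores(pred_probs, labels):
    """pred_probs: (n, k) out-of-sample class probabilities; labels: (n,) ints.
    Returns each example's predicted probability of its given label."""
    return pred_probs[np.arange(len(labels)), labels]

def find_likely_issues(pred_probs, labels, threshold=0.5):
    """Indices of examples whose given label the model doubts."""
    return np.where(label_quality_scores(pred_probs, labels) < threshold)[0]

pred_probs = np.array([[0.9, 0.1],   # labeled 0, model agrees
                       [0.2, 0.8],   # labeled 0, model disagrees -> suspect
                       [0.1, 0.9]])  # labeled 1, model agrees
labels = np.array([0, 0, 1])
issues = find_likely_issues(pred_probs, labels)  # flags example 1
```

In practice, the probabilities should come from out-of-sample (cross-validated) predictions, so the model cannot simply memorize its own bad labels; retraining without the flagged examples is what yielded the accuracy gain described above.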
Elements of effective machine learning datasets in astronomy
Boscoe, Bernie, Do, Tuan, Jones, Evan, Li, Yunqi, Alfaro, Kevin, Ma, Christy
In this work, we identify elements of effective machine learning datasets in astronomy and present suggestions for their design and creation. Machine learning has become an increasingly important tool for analyzing and understanding the large-scale flood of data in astronomy. To take advantage of these tools, datasets are required for training and testing. However, building machine learning datasets for astronomy can be challenging. Astronomical data is collected from instruments built to explore science questions in a traditional fashion rather than to conduct machine learning. Thus, it is often the case that raw data, or even downstream processed data is not in a form amenable to machine learning. We explore the construction of machine learning datasets and we ask: what elements define effective machine learning datasets? We define effective machine learning datasets in astronomy to be formed with well-defined data points, structure, and metadata. We discuss why these elements are important for astronomical applications and ways to put them in practice. We posit that these qualities not only make the data suitable for machine learning, they also help to foster usable, reusable, and replicable science practices.
- North America > United States > California > Los Angeles County > Los Angeles (0.17)
- Europe > Netherlands (0.04)
- Asia > Japan (0.04)
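The abstract names three elements of an effective dataset: well-defined data points, structure, and metadata. A hedged illustration of how those elements might be organized in code; the field names and values are invented for the example, not taken from any real survey:

```python
from dataclasses import dataclass, field

@dataclass
class AstroDataset:
    data: list                 # well-defined data points (e.g., flux arrays)
    labels: list               # structure: targets aligned with data points
    metadata: dict = field(default_factory=dict)  # provenance, units, splits

ds = AstroDataset(
    data=[[21.3, 20.9], [19.8, 19.5]],      # e.g., magnitudes in two bands
    labels=["galaxy", "star"],
    metadata={"bands": ["g", "r"], "units": "AB mag", "source": "simulated"},
)
assert len(ds.data) == len(ds.labels)  # structure: every point has a target
```

Keeping units and provenance in the metadata, rather than implicit in a file name, is one concrete way such a dataset supports the reusable, replicable practices the paper advocates.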
cleanlab 2.0: Automatically Find Errors in ML Datasets
Distributed ML is an active area of work, in both academia and industry, and has been for some time now; companies like Google were doing distributed machine learning decades ago. For some use cases, libraries like scikit-learn are entirely adequate. For others, such as training sophisticated models that require substantial compute, or training over datasets that don't fit on a single node, distributed computing is essential. On the topic of data storage: in some cases, system builders co-design the data storage and the data processing, and such co-design can give performance gains.
- Information Technology > Artificial Intelligence > Machine Learning (0.65)
- Information Technology > Data Science > Data Mining > Big Data (0.45)
Play With Your ML Dataset -- Cheatsheet in R
Understanding your data is usually half the battle won. For any machine learning project, it helps immensely to analyze your data from different points of view. Summarising a dataset means understanding how your data looks when subjected to simple statistical analysis. To illustrate the various techniques, let us consider the glass dataset from the R package mlbench. It has 214 observations containing examples of the chemical analysis of 7 different types of glass. For a quick look, we display the first 10 rows of the data.
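The article's first steps are in R (roughly `head(Glass, 10)` and `summary(Glass)` on mlbench's glass data). The same moves translate directly to Python/pandas; a sketch on a small synthetic frame, since the mlbench dataset is not bundled with Python and the column values below are illustrative, not the real measurements:

```python
import pandas as pd

df = pd.DataFrame({
    "RI": [1.52, 1.51, 1.52, 1.53],   # refractive index (illustrative values)
    "Na": [13.6, 13.9, 13.5, 13.2],   # sodium content (illustrative values)
    "Type": [1, 1, 2, 3],             # glass type label
})

first_rows = df.head(10)   # R: head(Glass, 10) -- here only 4 rows exist
stats = df.describe()      # R: summary(Glass) -- count/mean/std/quartiles
```

`describe()` gives the same at-a-glance summary as R's `summary()`: per-column counts, means, spreads, and quartiles, which is often enough to spot skew, outliers, or suspicious constants before modeling.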